Modeling Data Reuse in Deep Neural Networks by Taking Data-Types into Cognizance
In recent years, researchers have focused on reducing the model size and
number of computations (measured as "multiply-accumulate" or MAC operations) of
DNNs. The energy consumption of a DNN depends on both the number of MAC
operations and the energy efficiency of each MAC operation. The former can be
estimated at design time; however, the latter depends on the intricate data
reuse patterns and underlying hardware architecture. Hence, estimating it at
design time is challenging. This work shows that the conventional approach to
estimate the data reuse, viz. arithmetic intensity, does not always correctly
estimate the degree of data reuse in DNNs since it gives equal importance to
all the data types. We propose a novel model, termed "data type aware weighted
arithmetic intensity", which accounts for the unequal importance of
different data types in DNNs. We evaluate our model on 25 state-of-the-art DNNs
on two GPUs. We show that our model accurately models data reuse for all
possible data reuse patterns for different types of convolution and different
types of layers. We show that our model is a better indicator of the energy
efficiency of DNNs. We also show its generality using the central limit
theorem. Comment: Accepted at IEEE Transactions on Computers (Special Issue on
Machine-Learning Architectures and Accelerators) 202
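The contrast between conventional arithmetic intensity and a data-type-aware weighted variant can be sketched as follows. This is a minimal illustration of the idea only: the function names, the traffic numbers, and the per-type weights are all assumptions for demonstration, not the paper's actual model or values.

```python
# Sketch: conventional vs. data-type-aware weighted arithmetic intensity.
# The per-type weights and traffic volumes below are illustrative assumptions.

def arithmetic_intensity(macs, bytes_by_type):
    """Conventional arithmetic intensity: MACs per byte moved,
    treating all data types (weights, activations, outputs) equally."""
    return macs / sum(bytes_by_type.values())

def weighted_arithmetic_intensity(macs, bytes_by_type, type_weights):
    """Data-type-aware variant: each data type's traffic is scaled by a
    weight reflecting its importance/reuse behaviour before summing."""
    weighted_bytes = sum(type_weights[t] * b for t, b in bytes_by_type.items())
    return macs / weighted_bytes

# A conv layer moving different volumes (bytes) of each data type.
traffic = {"weights": 4_000, "input_acts": 16_000, "output_acts": 8_000}
macs = 1_000_000

ai = arithmetic_intensity(macs, traffic)
# Hypothetical weights: suppose weight traffic matters more than output traffic.
wai = weighted_arithmetic_intensity(
    macs, traffic, {"weights": 2.0, "input_acts": 1.5, "output_acts": 0.5})

print(round(ai, 1))   # equal importance to all data types
print(round(wai, 1))  # unequal importance shifts the estimate
```

With equal weights the two measures coincide; once data types are weighted differently, two layers with identical conventional intensity can rank differently, which is the gap the abstract points at.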
DeepReShape: Redesigning Neural Networks for Efficient Private Inference
Prior work on Private Inference (PI)--inference performed directly on
encrypted input--has focused on minimizing a network's ReLUs, which have been
assumed to dominate PI latency, rather than its FLOPs. Recent work has shown
that FLOPs for PI can no longer be ignored and carry high latency penalties. In this
paper, we develop DeepReShape, a network redesign technique that tailors
architectures to PI constraints, optimizing for both ReLUs and FLOPs for the
first time. The key insight is that a strategic allocation of channels
such that the network's ReLUs are aligned in their criticality order
simultaneously optimizes ReLU and FLOPs efficiency. DeepReShape automates
network development with an efficient process, and we call the generated networks
HybReNets. We evaluate DeepReShape using standard PI benchmarks and demonstrate
a 2.1% accuracy gain with a 5.2x runtime improvement at iso-ReLU on
CIFAR-100 and an 8.7x runtime improvement at iso-accuracy on
TinyImageNet. Furthermore, we demystify the input network selection in prior
ReLU optimizations and shed light on the key network attributes enabling PI
efficiency. Comment: 37 pages, 23 figures, and 17 tables
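The trade-off that DeepReShape navigates, that channel allocation drives both ReLU count and FLOPs, can be made concrete with back-of-envelope accounting for a small conv stack. Everything here (layer shapes, channel budgets, the helper names) is an illustrative assumption, not the paper's method.

```python
# Back-of-envelope ReLU and FLOP accounting for a conv stack, to show why
# channel allocation affects both cost metrics. All shapes are made up.

def conv_layer_costs(c_in, c_out, h, w, k=3):
    relus = c_out * h * w                      # one ReLU per output element
    flops = 2 * c_in * c_out * k * k * h * w   # multiply + add per MAC
    return relus, flops

def network_costs(channels, resolutions):
    """channels: output channels per layer; resolutions: (h, w) per layer.
    The input is assumed to have 3 channels (RGB)."""
    total_relus = total_flops = 0
    c_in = 3
    for c_out, (h, w) in zip(channels, resolutions):
        r, f = conv_layer_costs(c_in, c_out, h, w)
        total_relus += r
        total_flops += f
        c_in = c_out
    return total_relus, total_flops

# Two allocations with the same total channel budget (192) but channels
# placed at different spatial resolutions:
wide_early = network_costs([128, 64], [(32, 32), (16, 16)])
wide_late = network_costs([64, 128], [(32, 32), (16, 16)])
print(wide_early)  # (ReLUs, FLOPs) with channels concentrated early
print(wide_late)   # same budget, different ReLU/FLOP profile
```

Because ReLUs scale with output elements while FLOPs also scale with input channels, the same channel budget yields different ReLU/FLOP profiles depending on where the channels go, which is why a joint optimization is needed.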
E2GC: Energy-efficient Group Convolution in Deep Neural Networks
The number of groups in group convolution (GConv) is selected to boost
the predictive performance of deep neural networks (DNNs) in a compute and
parameter-efficient manner. However, we show that a naive selection of the
number of groups in GConv creates an imbalance between the computational
complexity and the degree of data reuse, which leads to suboptimal energy
efficiency in DNNs. We devise an optimum group size model, which enables a
balance between computational cost and data movement cost and thus optimizes
the energy efficiency of DNNs. Based on
the insights from this model, we propose an "energy-efficient group
convolution" (E2GC) module where, unlike the previous implementations of GConv,
the group size remains constant. Further, to demonstrate the efficacy of
the E2GC module, we incorporate this module in the design of MobileNet-V1 and
ResNeXt-50 and perform experiments on two GPUs, P100 and P4000. We show that,
at comparable computational complexity, DNNs with constant group size (E2GC)
are more energy-efficient than DNNs with a fixed number of groups (FGC). For
example, on P100 GPU, the energy-efficiency of MobileNet-V1 and ResNeXt-50 is
increased by 10.8% and 4.73% (respectively) when E2GC modules substitute the
FGC modules in both the DNNs. Furthermore, through our extensive
experimentation with ImageNet-1K and Food-101 image classification datasets, we
show that the E2GC module enables a trade-off between generalization ability
and representational power of a DNN. Thus, the predictive performance of DNNs can
be optimized by selecting an appropriate number of groups. The code and trained models are
available at https://github.com/iithcandle/E2GC-release. Comment: Accepted as
a conference paper in the 2020 33rd International Conference on VLSI Design and
2020 19th International Conference on Embedded Systems (VLSID)
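The imbalance the abstract describes, that increasing the group count cuts MACs faster than it cuts data movement, can be sketched with a simple cost model. The shapes, the 4-bytes-per-value assumption, and the function name are illustrative choices, not the paper's actual model.

```python
# Sketch of the compute-vs-data-movement trade-off in group convolution.
# Layer shape and bytes-per-value are illustrative assumptions.

def gconv_costs(c, h, w, g, k=3, bytes_per_val=4):
    """c in/out channels split into g groups; returns (MACs, bytes moved)."""
    macs = (c * c // g) * k * k * h * w           # g groups of (c/g)x(c/g) convs
    weight_bytes = (c * c // g) * k * k * bytes_per_val
    act_bytes = 2 * c * h * w * bytes_per_val     # input + output activations
    return macs, weight_bytes + act_bytes

for g in (1, 2, 4, 8, 16):
    macs, moved = gconv_costs(c=256, h=14, w=14, g=g)
    print(g, macs, moved, round(macs / moved, 1))  # arithmetic intensity per g
```

As g grows, MACs and weight traffic fall as 1/g but activation traffic stays fixed, so arithmetic intensity (data reuse) degrades, illustrating why the group count must balance computational cost against data movement cost rather than simply minimize MACs.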
Characterizing and Optimizing End-to-End Systems for Private Inference
Increasing privacy concerns have given rise to Private Inference (PI). In PI,
both the client's personal data and the service provider's trained model are
kept confidential. State-of-the-art PI protocols combine several cryptographic
primitives: Homomorphic Encryption (HE), Secret Sharing (SS), Garbled Circuits
(GC), and Oblivious Transfer (OT). Today, PI remains largely arcane and too
slow for practical use, despite the need and recent performance improvements.
This paper addresses PI's shortcomings with a detailed characterization of a
standard high-performance protocol to build foundational knowledge and
intuition in the systems community. The characterization pinpoints all sources
of inefficiency -- compute, communication, and storage. A notable aspect of
this work is the use of inference request arrival rates rather than studying
individual inferences in isolation. Prior to this work, and without considering
arrival rate, it has been assumed that PI pre-computations can be handled
offline and their overheads ignored. We show this is not the case. The offline
costs in PI are so high that they are often incurred online, as there is
insufficient downtime to hide pre-compute latency. We further propose three
optimizations to address the computation (layer-parallel HE), communication
(wireless slot allocation), and storage (Client-Garbler) overheads leveraging
insights from our characterization. Compared to the state-of-the-art PI
protocol, the optimizations provide a total PI speedup of 1.8x, with the
ability to sustain inference requests at up to a 2.24x greater rate. Comment: 12 figures
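The abstract's central observation, that whether "offline" pre-computation is really hidden depends on the request arrival rate, can be captured with a toy model. All timings, the evenly-spaced-arrivals simplification, and the function name are assumptions for illustration only.

```python
# Toy model of how much PI pre-compute fits into idle time between requests.
# Evenly spaced arrivals and all timing values are illustrative assumptions.

def precompute_hidden_fraction(arrival_rate, online_time, precompute_time):
    """Fraction of per-request pre-compute work that can be hidden in the
    downtime between consecutive requests (capped at 1.0)."""
    gap = 1.0 / arrival_rate            # time between consecutive requests
    idle = max(0.0, gap - online_time)  # downtime left after online work
    return min(1.0, idle / precompute_time)

# Low arrival rate: pre-compute hides entirely in downtime ("offline").
print(precompute_hidden_fraction(arrival_rate=0.01, online_time=5.0,
                                 precompute_time=60.0))
# Higher arrival rate: most pre-compute leaks into the online path.
print(precompute_hidden_fraction(arrival_rate=0.1, online_time=5.0,
                                 precompute_time=60.0))
```

Once the gap between requests shrinks below the pre-compute time plus online time, the "offline" cost is incurred online, which is the effect the characterization quantifies with real protocol measurements.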
ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks
The capability of the self-attention mechanism to model the long-range
dependencies has catapulted its deployment in vision models. Unlike convolution
operators, self-attention offers infinite receptive field and enables
compute-efficient modeling of global dependencies. However, the existing
state-of-the-art attention mechanisms incur high compute and/or parameter
overheads, and are hence unfit for compact convolutional neural networks (CNNs). In
this work, we propose a simple yet effective "Ultra-Lightweight Subspace
Attention Mechanism" (ULSAM), which infers different attention maps for each
feature map subspace. We argue that learning separate attention maps for each
feature subspace enables multi-scale and multi-frequency feature
representation, which is more desirable for fine-grained image classification.
Our method of subspace attention is orthogonal and complementary to the
existing state-of-the-art attention mechanisms used in vision models. ULSAM is
end-to-end trainable and can be deployed as a plug-and-play module in the
pre-existing compact CNNs. Notably, our work is the first attempt that uses a
subspace attention mechanism to increase the efficiency of compact CNNs. To
show the efficacy of ULSAM, we perform experiments with MobileNet-V1 and
MobileNet-V2 as backbone architectures on ImageNet-1K and three fine-grained
image classification datasets. We achieve a 13% and 25% reduction in both the
FLOPs and parameter counts of MobileNet-V2, with a 0.27% and a more than 1%
improvement in top-1 accuracy on the ImageNet-1K and fine-grained image
classification datasets, respectively. Code and trained
models are available at https://github.com/Nandan91/ULSAM. Comment: Accepted as
a conference paper in the 2020 IEEE Winter Conference on Applications of
Computer Vision (WACV)
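The core idea of one attention map per feature subspace can be sketched in a few lines. This is a deliberate simplification: ULSAM's actual ops (depthwise convolution and max-pooling) are replaced here by a channel-mean followed by a spatial softmax, and all names and shapes are assumptions for illustration.

```python
import math

# Minimal sketch of per-subspace spatial attention in the spirit of ULSAM.
# The real module uses depthwise conv + maxpool; here a channel-mean +
# softmax stands in, purely to show the subspace structure.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def subspace_attention(feature_maps, num_subspaces):
    """feature_maps: list of C channels, each a flat list of H*W values.
    Splits channels into subspaces and re-weights each subspace with its
    own spatial attention map."""
    c = len(feature_maps)
    assert c % num_subspaces == 0, "channels must divide evenly into subspaces"
    group = c // num_subspaces
    out = []
    for s in range(num_subspaces):
        chans = feature_maps[s * group:(s + 1) * group]
        # One attention map per subspace, from the mean over its channels.
        mean = [sum(ch[i] for ch in chans) / group for i in range(len(chans[0]))]
        attn = softmax(mean)
        # Re-weight each channel, keeping a residual copy of the input.
        out.extend([[v * (1 + a) for v, a in zip(ch, attn)] for ch in chans])
    return out

fmaps = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0],
         [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
out = subspace_attention(fmaps, num_subspaces=2)
print(len(out), len(out[0]))  # shape preserved: 4 channels of 4 values
```

Because each attention map is computed from only its own subspace, the cost per map is small and different subspaces can emphasize different spatial regions, which is the multi-scale, multi-frequency behavior the abstract argues for.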
ROLES OF SCHOOL DISTRICT COMPETITION AND POLITICAL INSTITUTIONS IN PUBLIC SCHOOL SPENDING AND STUDENT ACHIEVEMENT
Equity in school district spending, and equity and productive efficiency in educational outcomes, are of paramount importance in the literature on K-12 public education in the US. Research on the effects of school choice (operationalized as inter-school district competition) and local political institutions on unequal school district spending and on equity and productive efficiency in educational outcomes remains inadequate. This dissertation fills several gaps in the literature by 1) extending the literature on the Public Choice, Leviathan, Consolidated Local Government, and Reformism models that examines the interactive roles of local political institutions and school choice in equity in spending and in productive efficiency and equity in student achievement in public schools in metropolitan areas; and 2) modeling the equity effects of school choice and political institutions on school district spending and student achievement. Fixed effects, instrumental variable fixed effects, Hausman-Taylor regression, and multilevel linear regression models are applied to a uniquely compiled longitudinal dataset drawn from several sources, including the Popularly Elected Officials Survey from the US Census Bureau, the Local Education Agency (School District) Longitudinal Finance Survey, the National Education Longitudinal Study (NELS: 1988-92), and the School District Demographics System from the National Center for Education Statistics.
Results from fixed effects models lend support to interactive effects of political institutions and inter-school district competition on school district spending. Additive and interactive models do not robustly support the equity effects of inter-school district competition on school district spending. However, results from fixed effects and instrumental variable fixed effects models support the equity effects of political institutions on school district spending in some cases. School districts with more professional political institutions are also more equitable in public education spending.
Results show that whereas inter-school district competition has productive efficiency effects on student achievement, political institutions do not. In terms of equity, inter-school district competition and political institutions have differential effects on student achievement. Regarding the former, results imply that increased inter-school district competition leads to inequity in students' 10th grade reading scores and 12th grade reading and math scores. Regarding the latter, results suggest that differences in political institutions across school districts lead to inequity in students' 10th and 12th grade reading and math scores. School districts with more professional political institutions also have more equitable student achievement. Students' reading and math scores are generally higher in higher-income-quintile school districts than in lower-income-quintile school districts. These findings are significant because they inform policymakers about why and how organizational and political contexts matter in producing desirable educational outcomes. Policymakers can make organizational and political changes in school districts to achieve more effective public education.